Search results for "Algorithms and data structures"
showing 3 items of 3 documents
Epigenomic k-mer dictionaries: shedding light on how sequence composition influences in vivo nucleosome positioning
2014
Abstract Motivation: Information-theoretic and compositional analysis of biological sequences, in terms of k-mer dictionaries, has a well established role in genomic and proteomic studies. Much less so in epigenomics, although the role of k-mers in chromatin organization and nucleosome positioning is particularly relevant. Fundamental questions concerning the informational content and compositional structure of nucleosome favouring and disfavoring sequences with respect to their basic building blocks still remain open. Results: We present the first analysis on the role of k-mers in the composition of nucleosome enriched and depleted genomic regions (NER and NDR for short) that is: (i) exhau…
Algorithmic paradigms for stability-based cluster validity and model selection statistical methods, with applications to microarray data analysis
2012
AbstractThe advent of high throughput technologies, in particular microarrays, for biological research has revived interest in clustering, resulting in a plethora of new clustering algorithms. However, model selection, i.e., the identification of the correct number of clusters in a dataset, has received relatively little attention. Indeed, although central for statistics, its difficulty is also well known. Fortunately, a few novel techniques for model selection, representing a sharp departure from previous ones in statistics, have been proposed and gained prominence for microarray data analysis. Among those, the stability-based methods are the most robust and best performing in terms of pre…
Stability-Based Model Selection for High Throughput Genomic Data: An Algorithmic Paradigm
2012
Clustering is one of the most well known activities in scien- tific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. In this beautiful area, one of the most difficult challenges is the model selection problem, i.e., the identifi- cation of the correct number of clusters in a dataset. In the last decade, a few novel techniques for model selection, representing a sharp departure from previous ones in statistics, have been proposed and gained promi- nence for microarray data analysis. Among those, the stability-based methods are the most robust and best performing in terms of predic- tion, but the slowest in terms of time. Unfortunately…